OntoMate: a text-mining tool aiding curation at the Rat Genome Database

Authors

  • Weisong Liu
  • Stanley J. F. Laulederkind
  • G. Thomas Hayman
  • Shur-Jen Wang
  • Rajni Nigam
  • Jennifer R. Smith
  • Jeff de Pons
  • Melinda R. Dwinell
  • Mary Shimoyama
Abstract

The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism names and terms from most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and to a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu.
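The tag-then-filter idea described in the abstract can be illustrated with a minimal sketch. This is not OntoMate's implementation: the vocabulary, term IDs, record identifiers, and the function names `tag_abstract` and `search` are all hypothetical, and simple case-insensitive word matching stands in for whatever concept recognition the real system uses.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-vocabulary (term ID -> label). These IDs are made up
# for illustration and are not real RGD or ontology identifiers.
VOCAB = {
    "GENE:0001": "Ace",
    "DISEASE:0001": "hypertension",
    "ORG:0001": "rat",
    "ORG:0002": "mouse",
}


@dataclass
class Abstract:
    record_id: str          # made-up identifier, not a real PMID
    year: int
    text: str
    tags: set = field(default_factory=set)


def tag_abstract(abstract, vocab=VOCAB):
    """Attach the ID of every vocabulary term whose label occurs in the text.

    Assumption: whole-word, case-insensitive matching with an optional
    plural "s". Real concept recognition is considerably more involved.
    """
    for term_id, label in vocab.items():
        pattern = r"\b" + re.escape(label) + r"s?\b"
        if re.search(pattern, abstract.text, re.IGNORECASE):
            abstract.tags.add(term_id)
    return abstract


def search(abstracts, required_terms=(), organism=None, min_year=None):
    """Return abstracts carrying all required terms, with optional filters."""
    hits = []
    for a in abstracts:
        if not set(required_terms) <= a.tags:
            continue
        if organism is not None and organism not in a.tags:
            continue
        if min_year is not None and a.year < min_year:
            continue
        hits.append(a)
    return hits


# Made-up example records.
docs = [
    Abstract("REC-A", 2014, "Ace variants are associated with hypertension in rats."),
    Abstract("REC-B", 2009, "Hypertension models in the mouse."),
    Abstract("REC-C", 2015, "A study of rat strains."),
]
tagged = [tag_abstract(d) for d in docs]
results = search(tagged, required_terms=["DISEASE:0001"],
                 organism="ORG:0001", min_year=2010)
print([a.record_id for a in results])
```

In this sketch every tagged term stays attached to its abstract, so a search result can list the terms alongside the text, loosely mirroring how the abstract describes tagged terms being shown with each result.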

Similar resources

Using ODIN for a PharmGKB revalidation experiment

The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usa...

TADB 2.0: an updated database of bacterial type II toxin–antitoxin loci

TADB2.0 (http://bioinfo-mml.sjtu.edu.cn/TADB2/) is an updated database that provides comprehensive information about bacterial type II toxin-antitoxin (TA) loci. Compared with the previous version, the database has been refined and a new data schema is employed. With the aid of text mining and manual curation, it recorded 6193 type II TA loci in 870 replicons of bacteria and archaea, including 105 exp...

Integrating text mining into the MGI biocuration workflow

A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural languag...

Using the NCBO Web Services for Concept Recognition and Ontology Annotation of Expression Datasets

To provide enhanced access to expression datasets housed in the NCBI’s Gene Expression Omnibus database and to enable new opportunities for data mining we are using the NCBO’s Open Biomedical Annotator service to identify concepts and ontology terms in GEO records. Based on this first pass annotation we are curating these datasets using a variety of ontologies covering concepts of relevance to ...

The Rat Genome Database, update 2007—Easing the path from disease to data and back again

The Rat Genome Database (RGD, http://rgd.mcw.edu) is one of the core resources for rat genomics and recent developments have focused on providing support for disease-based research using the rat model. Recognizing the importance of the rat as a disease model we have employed targeted curation strategies to curate genes, QTL and strain data for neurological and cardiovascular disease areas. This...

Journal title:

Volume 2015, Issue

Pages -

Publication date: 2015